Gifi Update Notes 

Jan de Leeuw 

First created on September 30, 2019. Last update on October 23, 2019 

library ("MASS") 
source ("gifiEngine.R") 
source ( "gifiUtilities.R") 
source ("gifiCoding.R") 
source ("gifiStructures.R") 
source ("gifiMonotone.R") 
source ( "splineBasis.R") 
source ("homals.R") 


1 The data 

The data for a gifiAnalysis is list of vectors (or factors), each of length n, possibly with NA’s. 
Each vector can be numerical, character, or logical. Thus the data can be an R dataframe. 
Each element of the data can be thought of as a variable. All variables are functions defined 
on the same domain with n elements. 

For a gifiAnalysis the data are partitioned into a number of sets of variables. Each set can 
contain both active and passive variables. The active variables participate in the actual 
computations of the gifiAnalysis, the passive (or supplementary) variables are used solely 
for post-processing and presentation purposes. 

2 The gifi structure 

Information about the variables is coded in gifi Variables. A gifiVariable is a structure with 
ten components, which are all objects that do not change during computation (parameters, if 
you like). These are: 

• data - vector or factor of n numbers, or chars, possibly with NA’s; 

• basis - a matrix with a B-spline basis, n x G; 

• degree - degree of the spline d 

• copies - number of copies l t ; 

• ordinal - true or false; 

• missing - multiple, single, deleted, or average; 

• active - true or false; 

• set - integer code for set; 

• name - name of variable; 

• type - type (categorical, polynomial, or splinical). 
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A gifi is simply a list of gift Variables. 

3 Gifi loss 

The gifi structure dictates the form of the loss function. Gifi loss, previously described in 
Gifi (1990) as meet loss, is defined as 

-1 m 

a(X, Z,A) = —J2 SSQ (X - £ ^A). (1) 

mp j =l ieij 

We use SSQ for the (unweighted) sum of squares of a numerical vector or matrix. The 
outer summation in (1) is over active sets, the inner summation over the active variables 
in the set (active sets are gifiSets with at least one active gifiVariable). Thus the index 
set A/" = {1, 2, • • • , N}, where N is the total number of active variables in the analysis, is 
partitined into the m index sets Ij, with Ij fl Ii = 0 and Uj=i Ij = A f. 

In the loss function: 

• X is n x p, object scores, observations by dimensions; 

• Hj is the basis, n x /q, observations by basis, known and fixed; 

• Zi are the coefficients, k t x /*, basis by copies; 

• Ai are the loadings, x p, copies by dimensions. 

In addition we impose the constraints: 

• X is centered e'X = 0 and orthonormal X'X = I. 

• If a variable is ordinal there are inequality constraints on the non-missing elements of 
T{ = H{Zi (or directly on the elements of ZQ. 

4 The xGifi Structure 

In the same way there is an xGifi Variable and an xGifi. Once again, an xGifi is a list of 

xGifiVariables. 

The xGifiVariable has the information about the variable that changes during the com¬ 
putations of the gifiAnalysis, the estimates that are updated in each iteration. It has five 
components: 

• transform 7) = HiZ{, orthogonalized, observations by copies, n x Ip, 

• loadings A*, copies by dimensions, l t x p\ 

• scores S) = T)Aj = HiZjAi, observations by dimensions, n x p; 

• quantifications (), = Z^A^ degrees by dimensions, k t x p\ 

• coefficients Z l} basis by copies, ki x 

In previous versions of the gifiSystem a different terminology was used. The quantifications 
Qi were restricted to be of rank less than or equal to /j. There was no notion of copies, 
which assumes or implies a variable enters into the gifiAnalysis more than once, and there 
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was no explicit decomposition Q t = AA maintained. The notion of copies is more flexible 
than the original approach. 

There are some alternative ways to write Gifi loss that will come in handy. If we concatenate 
the transforms T, = II, Z, horizontally and the loadings /l.,. vertically we can write 

-i m 

a(X, Z, A) — — y SSQ (X - T, A). (2) 

rrip “ 

We can also concatenate the quantifications Qj vertically and write 

-i m 

a(X, Z,A) = —y SSQ (X - HjQj). (3) 

rrip 

This is basically what is used in the original gifiSystem, with rank constraints of the Q*. 

And finally we can concatenate the bases horizontally and make Zj the direct sum of the A 
in set j. Thus Zj is block diagonal, with the A as diagonal blocks. Then 

-1 m 

a(X, Z,A) = — y SSQ (X - H.Z.AA. (4) 

mp “l 

The following diagram may be helpful. Or not. 
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5 Sanity 

Before the analysis starts the gihVariables and gihSets are checked for sanity. If on eof the 
following conditions is violated the analysis stops with the corresponding error message. 

• A variable cannot be completely missing. 

• A gihAnalysis needs more than one active set. 

• A gihAnalysis needs more than one active variable. 

• Sets need to be coded using consecutive integers. 

• Each basis is non-negative, with rows that add up to one. 

• A passive variable must be the only variable in its set. 

6 The Gifi engine 

We minimize Gifi loss using alternating least squares, combined with majorization. The 
gifiEngine function The alternation is over X , the Z t and the A*. 

The orthonomality constraints on X are simple to handle, and minimizing over X for fixed 
scores Sj = HiZiAi is an orthogonal Procrustus problem. The engine has the option to use 
QR decomposition instead of Procrustus, which also gives a valid ALS method. 

Because A is unconstrained, minimizing over the Aj for given X and Tj = HjZj is a simple 
least squares problem, which can be solved for each set separately. Because our least squares 
problems are often singular or ill-conditioned, we use the Moore-Penrose inverse function 
ginv() from the MASS package. 

The most complicated part is solving for the Z\ for given A and X, and that is where 
majorization conies in. We outline the majorization theory for minimizing gihLoss in an 
appendix. There are three possible ways to majorize discussed there, but only one of them is 
implemented in the gifiEngine. 

7 Active and Passive Variables 

Passive variables are handled in two stages. In the first stage we minimize the loss function 
(1) over X , Zi, and Aj. Then in the second stage we minimize 

SSQ (X - HiZiAi) (5) 

separately for each passive variable, using the optimal X from the first stage. 

Fitting passive variables is actually implemented in the main iteration loop of the gifiEngine, 
so we do not really need two stages. Each passive variable defines its own set, in which it is 
the only variable. And these passive sets do not contribute to the loss and to the alternating 
least squares step that updates X. 
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8 Choice of Basis 


Note that in earlier versions of the gifiSystem all data were categorical. The basis always 
was an indicator matrix. In this version the basis is either categorical, or a B-spline basis 
(De Boor (2001)). For a B-spline basis the user has to give a vector of interior knots. The 
exterior knots are always chosen to be the maximum and minimum of the data values, 
which implies that for B-spline basis the data must be numerical. There can be missing data, 
because the basis is only constructed for the non-missing data. After basis construction the 
various missing data options are used to complete the basis to a non-negative basis-matrix in 
which all rows add up to one. For categorical data, which do not have to be numerical, an 
indicator matrix is used, which is also non-negative with rows adding up to one. 

In actual computation we use a reduced basis, which is obtained from the basis matrix by 
columnwise centering and then leaving off the last, column. Columnwise centering has the 
effect that all rows now add up to zero and the basis is singular. Leaving off the last column 
can eliminate that singularity, and still gives a basis for the space of centered vectors in the 
space spanned by the original basis. Moreover, a centered vector is a linear combination of the 
original basis vectors with increasing coefficients if and only if it also is a linear combination 
of the reduced basis vectors with increasing coefficients. This is important for our treatment 
of ordinal variables. 


9 Missing Data 

For each variable there are four options for handling missing data. They are defined in terms 
of the bases of the variables, and they preserve the non-negativity and unit-row-sum property 
of the bases. 

• single add a single 0/1 column to the basis with 1 for missing. 

• multiple add a 0/1 column to the basis for each missing observation, i.e. append an 
identity matrix. 

• average for missing observations, use a basis row with all elements equal to the mean 
of the non-missing rows. 

• random for missing observations, use basis rows by sampling with replacement from 
the non-missing rows. 

10 Ordinal Restrictions 

Monotone splines De Leeuw (2017) 

If a variable has more than one copy (in original Gffi terminology this means the variable a 
multiple quantification) then we require only the first r copies are constrained ordinally.. 

Because of the indeterminacy in the product S) = HiZjAi, the requirement that r copies are 
monotonic with the data is actually equivalent to the requirement that there are r monotone 
directions in the column space of the basis (and these directions can be highly correlated). 
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Busing 


11 Wrappers 

The following wrappers, and their defining properties, are included: 

• homals: one variable per set, all variables ndim copies 

• princals: one variable per set, all variables one copy 

• primals: one variable per set, all variables one copy, ndim = 1 

• canals: two sets, all variables one copy 

• morals: two sets, one set has a single variable with one copy, ndim = 1 

• criminals: two sets, one set has a single variable with ndim copies 

• addals: two sets, one set one variable with one copy, other sets variables have ndim 
copies, ndim = 1 

For wrappers such as canals, princals, morals, addals, criminals the numerical output (the 
results returned by the wrapper functions) should look as if ordinary regression, PCA, CDA, 
etc. was applied to the transformed data. 

For this the results in Appendix 1 are helpful. We use the fact that the classical multivariate 
techniques can all be formulated as finding eigenvalues and eigenvectors of the generalized 
eigenvalue problem Cx = mXDx , where C = T'T and D = (BJL\ Tj. Here T is the horizontal 
concatenation of the Tj = HjZj. 

Thus, after convergence of the gifiAnalysis, each wrapper recovers corresponding classical 
multivariate analysis results in their familiar form. 

The wrappers far from exhaust all possible gifiAnalyses. Using the two functions makeGifiQ 
and gifiEngineQ users can write their own wrappers, or do perform analyses that deviate 
from the standard wrappers (such as analyses with variables having different numbers of 
copies within a set). Here is an example, 

data (sleeping, package= "homals") 
nobs <- length (sleeping[[1]]) 
nvar <- length (sleeping) 
ndim <- 2 
set.seed <- NULL 

x <- ortho (center (matrix (rnorm (nobs * ndim), nobs, ndim))) 
g <" 

makeGifi ( 
sleeping, 

knots = list (c( -5, 0,5) , c(500, 1000, 1500, 2000, 2500) , 100*1 :7, NULL) , 

degrees = c (2, 2, 2, -1), 

copies = c (1 , 1 , 1 , 2) , 

ordinal = c (0, 0, 0, 0), 

missing = rep ("r", 4), 

active = rep (TRUE, 4), 
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names = names (sleeping), 
sets = c(l, 1, 2, 1) 

) 

f <- xGifi(g, x) 
h <- 

gifiEngine( 
g> 

ndim = ndim, 
use_qr = FALSE, 
itmax = 1000, 
eps = le-6, 
seed = 123, 
verbose = TRUE 

) 


12 Appendix: Three problems 

Consider the function, called meet-loss by Gifi, 

m 

o(Y, B) = Y. SSQ (X - TjAj). (6) 

j = 1 

Here the Tj are given n x kj matrices, and we want to minimize cr over the n x p matrix X 
and the m kj x p matrices A y Obviously we need some sort of normalization to make this 
problem interesting, because otherwise X = 0 and Aj = 0 trivially solve the minimization 
problem. The three problems we discuss in this appendix use three different normalizations. 

First we simplify the notation. Define Dj = Tjfy , and the direct sum D = ©”Li Dj. Thus D 
is block diagonal, with the Dj as the diagonal blocks. Also define 


and 


Then 


T = 
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A 


A\ 


a(X , A) = m tr X'X - 2 tr X'TA + tr A'DA. 


12.1 First Problem 

The first problem we discuss is minimization of a over A" constrained by mX'X = I, with no 
constraints on A. 
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If mX'X = I then 


min a(X, A) = p - tr X'TD+T'X , 

where the minimum is attained for A = D + T'X, with D + the Moore-Pcnrose inverse. It 
follows that problem 1 is solved by choosing A" equal to the eigenvectors corresponding with 
the p largest eigenvalues of TD + T'X = mA r $, normalize them by mX'X = /, and then 
setting A = D + T'X. 

12.2 The Second Problem 

The second problem has no constraints on X, but requires A'DA = I. If A’DA = / then 

nrincrfX, A) = p ——tr A'CA, 
y m 

where C = T'T. The minimum is attained for X = p-T A. Problem 2 is solved by finding the 
eigenvectors corresponding with the p largest eigenvalues of CA = normalize them 

by A'DA = /, and then setting X = -p-TA. 

12.3 The Third Problem 

The third problem is to use both mX'X = / and A'DA = I. Then 

a(X,A) = 2(p-tr X'TA ). 

Problem 3 is solved by finding the left and right singular vectors corresponding with the 
largest singular values of TA = mX A and T'X = DAA, both appropriately normalized. 

It follows that $ = d' = A 2 , and the solutions to the three problems are essentially the same, 
except for the scaling of dimensions. If A with A'DA = I and X with mX'X = / solve the 
singular value problem 3, then X and D + T'X = AA solve problem 1, and A and p-TA = XA 
solve problem 2. 

13 Appendix: Majorization 

In this appendix we consider the problem of minimizing 

<y(Zi) = SSQ (X - HjZjAj) 

over Zj, where Hj has the Hi for % G Ij concatenated horizontally, the Aj have the Ai 
concatenated vertically, and the Zj is the direct sum of the Zj. Also define the transforms 
Tj = HjZj, which have the Tj = II, Z, concatenated horizontally, and the quantifications 
Qj = ZjAj, which have the Qi = ZiAi concatenated vertically. 

We start with expanding cr around the current best solution Zj. Define the residuals 
Rj = X — HjZjAj. Then 

a(Zj) = SSQ(X - Hj{Zj + (Z, - Z,)}A) = SSQ(R, - H(Z, - Z,)A,) 

= a(Z t ) - 2 tr (Z, - Z^'H'^A) + tr (Z, - ZpH'^Z - Z)^ 

From here we can go at least three ways. 



13.1 First Majorization 

The first majorization uses the result that for any positive semi-definite F and G of the same 
order we have tr FG < A max (F)tr G, where A max (F) is the largest eigenvalue of F. The proof 
is simple, using the eigen-decomposition F = KAK'. 

tr FG = tr KAK'G = tr A K'GK < A max (F)tr K'GK = A max (F)tr G. 

We apply this result to our minimization problem. Suppose Kj is the largest eigenvalue of 
AjAj. Then 

cr(Zj) < cr(Zj) — 2 tr ( HjZj — HjZj)'R jAj + k.j tr ( HjZj — HjZj)' [HjZj — HjZj ) 

This gives the majorization function that must be minimized over Z. By completing the 
square this can be done by minimizing SSQ ( HjZj — Uj), where U 3 = HjZj + Kj 1 RjA'j. 

If we majorize in this way for every set we obtain the majorizing function 

m 

EE SS Q (HiZi-Ui), 

j =i *e/j 


with Ui = HiZi + FJ 1 E A' t . Note that Rj, and thus U t , depends on the current best scores 
of the other variables in the set. Nevertheless it is fine from the algorithmic point of view 
to update the Rj after all Z { have been adjusted (think about, the distinction between the 
Gauss-Jordan and the Gauss-Seidel method for iteratively solving linear equations). 

13.2 Second Majorization 

Alternatively, we can use another inequality. If F and G are two positive semi-definite 
matrices, then for any F for which the matrix product E'FEG is defined tr E'FEG < 
A m ax(F)A max (G)SSQ(F). The proof uses e = vec(F) and the Kronecker product. 

tr E’FEG = e\F ® G)e < A max (F ® G)e'e = A max (F)A max (G)SSQ(F). 

If Uj is the largest eigenvalue of H'jHj then 

a(Zj) < cr(Zj) - 2 tr (Z f - ZjYH'RjA'j + k/i SSQ (Z 3 - Z 3 ) 

and this majorization function can be minimized by minimizing SSQ (Zj — V 3 ), with V 3 = 

Zj + "7V7 1 ir , l \i v :r 

The second majorization minimizes 


E E SS Q (z. - k). 


j =i ieij 


with Vi = Zi + k 3 l /ij 1 H-RjAY 
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13.3 Third majorization 

There is a third majorization, which also may be useful. We now expand around Qi = ZiAi. 
Using the same reasoning as in the other two majorizations, we arrive at 

m 

EE SS Q frAi-Yd. 

j =i ieij 

with Yi = ZiAi + // ' H-Rj and R :I = X — HjZjAj. 

13.4 Comparison 

The first majorization makes most sense when the ordinal constraints are on the columns of 
T % = HiZi , the second one when they are on the columns of Z % . Of course in general T r will be 
much larger than Z*, so the second and third majorization optimize in much smaller spaces. 
The third majorization is particularly useful if there are no ordinal constraints, because 
minimizing over ZiAi is just a singular value problem, which is usually small (the minimum 
of the number of columns in the basis and the dimensionality). For both the second and 
third majorization we have to compute the ^ij , but they do not change over iterations, so 
that only has to be done once. 

14 Appendix: Copy Expansion 

If we adopt the modification suggsted in this appendix we may have to change the basic 
structures. If variable i has l copies then S = HZ can be written as 

i 

s = E Hz s^s, 

s =1 

or, in words, having a variable with / copies is exactly the same as having l variables with 
one copy, where all l variables have the same basis. In that sense the notion of copies (or 
the older notion of rank) is superfluous, and our gikStructures could reflect that. 

The main reason to use copies is to save storage (only one basis for all copies). In addition 
we must make sure the different coefficients z s corresponding with the same basis H are, and 
remain, linearly independent. One alternative way of handling copies is to define a gifiCopy, 
and define a gifiVariable as a list of gffiCopies. Or we can leave the gifiVariable as is, and 
treat each copy as a different xGihVariable. 

data (hartigan, package = "homals") 
h <- homals (hartigan) 

0.5157409 in 286 

hortigan <- 
data.frame( 
hartigan[[!]] , 
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hartigan[ [1] ], 
hartigan[ [2] ], 
hartigan[ [2] ], 
hartigan[ [3] ], 
hartigan[ [3] ], 
hartigan[[4]], 
hartigan[[4]], 
hartigan[[5]], 
hartigan[[5]], 
hartigan[ [6] ], 
hartigan[ [6] ] 

) 

x <- ortho (center (matrix(rnorm(48) , 24, 2))) 
g <" 

makeGif i( 
hortigan, 

knots = rep (NULL, 12), 
degrees = rep(-l, 12), 
ordinal = rep (0, 12), 
copies = rep(l, 12), 
missing = rep ("r", 12), 
active = rep (TRUE, 12) , 
names = names (hortigan), 

sets = c(l, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6) 

) 

f <- xGif i(g, x) 
h <- 

gifiEngine( 
g> 

ndim = 2, 
use_qr = FALSE, 
itmax = 1000, 
eps = le-6, 
seed = NULL, 
verbose = FALSE 

) 

0.529083 in 120 

15 Appendix: Partials 

x <- matrix (rnorm(50) , 25, 2) 
y <- matrix (rnorm(25) , 25, 1) 
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cxy <-cancor (x, cbind (x, y)) 
names (cxy) 

## [1] "cor" "xcoef" "ycoef" "xcenter" "ycenter" 

16 Appendix: Upgrades 

In the future I will use basically the same approach using a variable structure (will not 
change much from the gifiVariable) and an engine to implement some upgrades. First, we will 
add burtEngine, which will allow us to add (generalized) correspondence analysis. Then, 
gifiEngine will give birth to nextEngine, which implements the theory of De Leeuw (2004). 
We add orthoBlock to the possible types of a variable and require a pattern matrix to code 
the SEM. This allows us to add pathals, dynamals, lisrals, lineals, MIMIC, factals etc. Then, 
we will add aspectEngine, which implements De Leeuw (1988). 
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